This paper focuses on the application of principal component analysis
(PCA) to classify and localize power system faults in a three phase, radial,
long transmission line using receiving end line currents, with the training
data taken for faults almost at the midpoint of the line length. The PCA
scores are analyzed to compute the principal component distance index (PCDI),
which is further processed using a ratio based analysis to develop the ratio
index matrix (R), the ratio error matrix (RE) and the ratio error index (REI).
These are used to develop a fault classifier, which produces 100% correct
predictions. The later part of the paper deals with the development of a fault
localizer using the same PCDI corresponding to six intermediate training
locations, which are analyzed with multiple linear regression (MLR) in order
to predict the fault location with high accuracy, with an error of only 87 m
for a 150 km long radial transmission line.
1. INTRODUCTION

The electrical power transmission system is one of the most spatially extended technical systems,
directly exposed to the environment and fairly often subjected to atmospheric hazards leading to different
types of faults. Hence, power system stability, reliability, protection and regulated power flow have been
prime topics of research. Identification, classification and localization of faults have long been under
in-depth research. Prediction of fault location, especially in long transmission systems with high and very
high voltage and in large power systems, is one of the most challenging tasks in the development of
a robust power system protection algorithm. Hence, prompt detection and classification of faults,
along with precise fault location determination, have been pursued by researchers in order to
ensure system safety and stability. Supervised learning algorithms like the artificial neural network (ANN)
and the probabilistic neural network (PNN) have had a great impact in the area of identification
and localization of faults [1-4]. ANN is sometimes combined with other techniques like fuzzy logic in fault
treatment [5]. Wavelet transformation and wavelet entropy have been implemented extensively and successfully
in fault analysis [6-7]. Wavelet transformation has often been combined with other methods like the adaptive
neuro fuzzy inference system [8], genetic algorithm (GA) [9], principal component analysis (PCA) [10], etc.

Other analytical techniques include support vector machines, which have made a significant contribution to
the design of power system protection algorithms [11-12]. Dynamic phasors are another approach used for
the analysis of faults in power systems [13]. Principal component analysis (PCA), on the other hand, is a
useful statistical and analytical technique for multivariate statistical analysis. It reduces a multidimensional
data set to a set of directions, called principal components (PC), in decreasing order of importance,
retaining the variability of the data and their mutual variations, and highlighting broadly the similarities
and differences [14-17]. Hence, PCA has been used extensively in power system analysis, especially in fault
detection, classification and distance prediction, where multidimensional data are obtained regarding voltage,
current, power, frequency, etc. and/or a combination of these parameters [18-22]. Fault analysis is thus
an important issue in power system research. Classification and detection of faults is essential to save
vital time and effort of the working personnel, and a close approximation of the fault location
makes it easier to find and remove the fault. Hence, faults are required to be detected fast and located
accurately to restore normal power flow at the earliest.

The proposed work is intended to develop a simple PCA based power system protection algorithm
suitable for the classification and localization of different types of power system faults in a three phase radial
long transmission system, using pattern indices and fault signatures developed by the application of PCA,
leading to the development of the principal component distance index (PCDI) [22]. Similarity analysis of
the PCDI based fault signatures identifies the maximum proximity of the test data to any of the fault
prototypes using the minimum square error (MSE) criterion, thus classifying the unknown fault. Fault localization
has been carried out using multiple linear regression (MLR) [23] over the three phase
PCDI. This helps in developing a general regression model, which is further used to test unknown data for
fault localization. A transmission line prototype has been modeled in EMTP-ATP simulation [24], followed
by analysis of quarter cycle pre-fault and half cycle post-fault receiving end current waveforms in
the MATLAB environment with the proposed PCA based protection scheme, using ten different fault prototypes
conducted at varying fault locations along the line span, together with the healthy condition.


2. SYSTEM DESIGN 

A single end fed 400 kV, 150 km long, single circuit, three phase, radial, overhead transposed AC
transmission line has been designed in the electromagnetic transients program (EMTP) by joining fifteen
three phase line cable constants (LCC) blocks, each of 10 km, in cascade, as shown in Figure 1. Ten different
types of faults, viz., SLG-A, SLG-B, SLG-C, DL-AB, DL-BC, DL-CA, DLG-AB, DLG-BC, DLG-CA, and LLL, have been
conducted at fifteen different locations, 10 km apart, throughout the entire length of 150 km. The resistance
and inductance of the bundled line are taken as 0.0585 Ω/km (DC resistance) and 0.2 mH/km. The spacing
between two adjacent horizontal lines is kept at 17.5 cm.


[Figure 1 diagram labels: 3 Phase AC Source; Sending End]




Figure 1. Simulation model of the radial, single end fed, long transmission line 


3. DATA PREPARATION 

The proposed algorithm is trained with only one set of training fault data, comprising ten different types
of faults conducted at almost the midpoint of the line, i.e., at 70 km from the sending end of the 150 km long
transmission line, together with healthy condition data. Quarter cycle pre-fault and half cycle post-fault
receiving end line fault current is collected at a sampling frequency of 10 kHz, i.e., 2000 samples/cycle, so that
the sample vector becomes an array containing 1500 data points for each type and for each phase. Hence, the three
phase training data matrix corresponding to each fault type carried out at 70 km takes the dimension
1500x3, i.e., 1500 rows and 3 columns, for one type of training fault, and is given by:

$$X_i = \begin{bmatrix} I_{a,i}^{1} & I_{b,i}^{1} & I_{c,i}^{1} \\ I_{a,i}^{2} & I_{b,i}^{2} & I_{c,i}^{2} \\ \vdots & \vdots & \vdots \\ I_{a,i}^{1500} & I_{b,i}^{1500} & I_{c,i}^{1500} \end{bmatrix}_{1500 \times 3}$$

where $I_a$, $I_b$ and $I_c$ are the receiving end line currents under fault condition for the three phases,
the superscript is the sample number, and i is the index defining the fault prototype, which counts up to twelve.
i = 1 represents the healthy or no fault condition, i = 2 to 11 represent the ten different fault prototypes,
and i = 12 stands for the test data of unknown type. Hence, the total data matrix takes the dimension
[1500x(3x12)], i.e., 1500x36:

$$X = [X_1 \; X_2 \; X_3 \; \ldots \; X_{12}]_{1500 \times 36}$$

Further modification has been carried out by grouping the individual phases to construct three
phase-wise matrices [Xa], [Xb], and [Xc], in which each column belongs to one prototype:

$$X_a = \begin{bmatrix} I_{a,1}^{1} & I_{a,2}^{1} & \cdots & I_{a,12}^{1} \\ I_{a,1}^{2} & I_{a,2}^{2} & \cdots & I_{a,12}^{2} \\ \vdots & \vdots & & \vdots \\ I_{a,1}^{1500} & I_{a,2}^{1500} & \cdots & I_{a,12}^{1500} \end{bmatrix}_{1500 \times 12}$$

[Xb] and [Xc] are constructed in a similar way, each of which is a 1500x12 matrix. Hence the modified
data matrix takes the form:

$$X = [X_a \; X_b \; X_c]_{1500 \times 36}$$

[Xa], [Xb] and [Xc] are processed by the PCA algorithm separately to obtain the PCA scores of each phase
individually, so that each prototype i yields a principal component distance index (PCDI) vector
of the order 1x3. The twelve prototypes are analyzed in sequence to form the complete PCDI matrix of
dimension 12x3, denoted by S, each row of which corresponds to one of the eleven training cases or the test
condition and each column of which represents one of the three phases. As mentioned before, PCA reconstructs
a data set in descending order of importance and, for ease of analysis, only the two most important directions
(PCs) and the corresponding score data are considered for the present purpose and used to construct the PCDI
matrix. These PCDI values are approximate estimates of the deviation of each fault current from the healthy
condition. The directions of variation are given by the eigenvectors obtained from the covariance matrix of
the transformed data points or scores, and the magnitudes of deviation from the origin (the origin is assigned to
the no fault condition) are given by the corresponding eigenvalues.
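The bookkeeping of this section can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the twelve records are assumed to be available as an array of shape (12, 1500, 3), the function name `group_by_phase` is hypothetical, and the waveforms are random placeholders rather than EMTP output.

```python
import numpy as np

SAMPLES_PER_CYCLE = 2000                      # value stated in the text
N_SAMPLES = int(0.75 * SAMPLES_PER_CYCLE)     # quarter cycle pre-fault + half cycle post-fault
N_PROTOTYPES = 12                             # healthy, ten fault types, one test record


def group_by_phase(records):
    """Build X (1500x36) and the phase-wise matrices Xa, Xb, Xc (1500x12 each).

    records: array of shape (12, 1500, 3), i.e. prototype x sample x phase
    (an assumed layout, chosen for illustration).
    """
    records = np.asarray(records, dtype=float)
    if records.shape != (N_PROTOTYPES, N_SAMPLES, 3):
        raise ValueError("expected shape (12, 1500, 3)")
    # X = [X_1 X_2 ... X_12]: three columns (a, b, c) per prototype
    X = np.hstack([records[i] for i in range(N_PROTOTYPES)])
    # Phase-wise matrices: one column per prototype
    Xa, Xb, Xc = (records[:, :, p].T for p in range(3))
    return X, Xa, Xb, Xc


# Placeholder data only; NOT simulation results
rng = np.random.default_rng(0)
records = rng.standard_normal((N_PROTOTYPES, N_SAMPLES, 3))
X, Xa, Xb, Xc = group_by_phase(records)
print(X.shape, Xa.shape, Xb.shape, Xc.shape)
```

Column n of each phase matrix holds the waveform of prototype n for that phase, which is the arrangement the PCA step operates on.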


4. PCA ALGORITHM 
4.1. Generalized PCA algorithm 

Input: N x d data matrix X (each row contains a d-dimensional data point $x^{(i)}$)

- Compute the mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x^{(i)}$

- Subtract the mean from the rows of X: $\tilde{X} = X - \mu$

- Compute the covariance matrix: $\Sigma = \frac{1}{N} \tilde{X}^{T} \tilde{X}$

- Calculate the eigenvalues and eigenvectors of $\Sigma$

- Pick a few eigenvectors (d' < d) corresponding to the largest eigenvalues and put them in the columns of
A in descending order of eigenvalues, i.e., $A = [V_1, V_2, \ldots, V_{d'}]$, where $V_1$, $V_2$ are the 1st, 2nd PCs and so on.

- Compute the new data matrix (PC scores) in reduced dimension: $X_{new} = A^{T} \tilde{X}^{T}$, which is a d' x N matrix
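The generalized steps above can be sketched with NumPy as follows. The function name `pca_scores` and the small demonstration data set are illustrative assumptions; the scores are returned in row form (N x d'), which is the transpose of $A^{T} \tilde{X}^{T}$.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA by eigen-decomposition of the covariance matrix.

    X: (N, d) array with one data point per row.
    Returns (scores, A, eigenvalues) with scores of shape (N, n_components).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                       # mean of every column
    Xc = X - mu                               # mean-centred data
    cov = Xc.T @ Xc / X.shape[0]              # (1/N) * Xc^T Xc
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    A = eigvecs[:, order]                     # columns = leading eigenvectors
    scores = Xc @ A                           # row form of A^T Xc^T
    return scores, A, eigvals[order]


# Four points on the line y = x: all variance lies along one direction
X_demo = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
scores, A, eigvals = pca_scores(X_demo)
print(eigvals)
```

`numpy.linalg.eigh` is used because the covariance matrix is symmetric; its eigenvalues come back in ascending order, so they are re-sorted to obtain the descending order the algorithm calls for.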
4.2. PCA algorithm applied for the proposed work

Step 1: Assign input data: the input data are the phase-wise matrices [Xa], [Xb], and [Xc]. Each
phase is processed individually and its data matrix is denoted by $J_k$, where k takes the indices a, b, and c.

Step 2: Compute the mean of $J_k$ for each of the columns individually as:

$$(\mu_k)_n = \frac{1}{N} \sum_{i=1}^{N} (J_k)_{i,n}$$

where i indexes rows and takes the values 1 to N = 1500, and n indexes columns and takes the values 1 to 12.

Step 3: Subtract the mean of each column from every row of that column of $J_k$, for each column
independently, to form the modified matrix: $(J_k^{mod})_n = (J_k)_n - (\mu_k)_n$. Hence the dimension remains
1500x12.

Step 4: Compute the covariance matrix of the mean-centred data, as in Section 4.1:
$\Sigma_k = \frac{1}{N} (J_k^{mod})^{T} J_k^{mod}$

Step 5: Calculate the eigenvalues and eigenvectors of $\Sigma_k$

Step 6: Pick a few eigenvectors (d' < d) corresponding to the largest eigenvalues and put them in the columns
of A in descending order of eigenvalues, i.e., $A = [V_1, V_2, \ldots, V_{d'}]$, where $V_1$, $V_2$ are the 1st, 2nd PCs
and so on. For the proposed work, only the eigenvectors of the two largest eigenvalues are taken, hence $V_1$ and $V_2$.

Step 7: Compute the new data matrix (PC scores) in reduced dimension: $J_k^{mod}(new) = A^{T} (J_k^{mod})^{T}$. Hence,
with all eigenvectors retained, the dimension of the score matrix $J_k^{mod}(new)$ becomes 12x1500. The proposed
work uses only the two most significant directions. Hence, the working dimension of $J_k^{mod}(new)$ reduces
to 12x2, which acts as the PC score matrix for the proposed work.

Step 8: Forming the PCDI matrix: the PCA distance is formed by finding the vector distance of each
of the training and test scores (2D) from the no-fault score (2D), which is taken as the origin. This
produces a 12x1 PCDI vector for each phase and, considering all three phases, the total 12x3 PCDI
matrix, denoted by $S_{12 \times 3}$. The top eleven rows of S correspond to the eleven different training
conditions, the twelfth row corresponds to the test condition, and each column represents one of the
three phases:

$$S = [\text{PCDI-A}_i \;\; \text{PCDI-B}_i \;\; \text{PCDI-C}_i]_{12 \times 3}$$

where i = 1 to 12 in the sequence NO-FLT (healthy), SLG-A, SLG-B, SLG-C, DL-AB, DL-BC, DL-CA,
DLG-AB, DLG-BC, DLG-CA, LLL, and TEST. The proposed algorithm discussed so far and the formation
of PCDI follow the flowchart given in Figure 2.
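Step 8 can be illustrated with a short sketch. The steps do not fully pin down how the 12x2 score matrix is obtained, so the code below adopts one possible reading, stated here as an assumption: each prototype waveform (one column of the phase matrix) is treated as a single observation, its two leading PC scores are computed through an SVD, and the PCDI is the Euclidean distance of each score from the healthy one. The function names and the synthetic sine waveforms are hypothetical and are not taken from the simulation.

```python
import numpy as np


def pcdi_for_phase(J, n_components=2, healthy_index=0):
    """PCDI vector for one phase (assumed reading of Steps 1 to 8).

    J: (n_samples, n_prototypes) phase matrix, healthy prototype in column 0.
    """
    W = np.asarray(J, dtype=float).T             # one row per prototype waveform
    W = W - W.mean(axis=0)                       # centre across prototypes
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # (n_prototypes, 2) PC scores
    # distance of every 2D score from the healthy (no-fault) score
    return np.linalg.norm(scores - scores[healthy_index], axis=1)


def pcdi_matrix(Ja, Jb, Jc):
    """Stack the three phase-wise PCDI vectors into S (n_prototypes x 3)."""
    return np.column_stack([pcdi_for_phase(J) for J in (Ja, Jb, Jc)])


# Synthetic example: healthy wave, an undisturbed copy, and two scaled "faults"
t = np.linspace(0.0, 0.75, 1500, endpoint=False)
base = np.sin(2.0 * np.pi * t)
J_demo = np.column_stack([base, base, 3.0 * base, 5.0 * base])
pcdi = pcdi_for_phase(J_demo)
S_demo = pcdi_matrix(J_demo, J_demo, J_demo)
print(pcdi)
```

In this toy case the undisturbed copy lands on the healthy score, so its PCDI is numerically zero, which mirrors the behaviour the paper reports for the unaffected phase of a DL fault.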



Figure 2. Flowchart illustrating the formation of PCDI matrix 
Further, the total PCDI matrix S is segmented into two matrices, viz. the training PCDI matrix (denoted
by P) and the test PCDI matrix or vector (denoted by Q), hence splitting the 12x3 matrix into two matrices as
given here:

$$P = [\text{PCDI-A}_i \;\; \text{PCDI-B}_i \;\; \text{PCDI-C}_i]_{11 \times 3}, \quad i = 1, \ldots, 11$$

$$Q = [\text{PCDI-A}_{TEST} \;\; \text{PCDI-B}_{TEST} \;\; \text{PCDI-C}_{TEST}]_{1 \times 3}$$

$$S = [P; Q]_{12 \times 3}$$

Further, similarity analysis has been carried out in order to compare the experimental data
(Q vector) with the training fault signatures (P matrix) for each individual phase and to find the maximum
similarity with any of the eleven different patterns, thus classifying the fault. It is observed that the PCDIs
vary following a certain pattern when computed for faults conducted at increasing geometric distance from
the sending end, but the pattern across the three individual phases of PCDI remains identical. E.g., for a DL-AB
fault, the magnitudes of PCDI of phases A and B are very high in comparison to phase C, which remains
almost zero as it is the undisturbed phase. This pattern remains almost the same even with changing fault
location. Besides, the rate of change in magnitude of the PCDI of each phase is nearly identical with
increasing or decreasing fault location. In order to establish the above inference mathematically, the PCDI
of each phase is divided by the PCDI of another phase; this ratio should remain almost the same regardless
of the geometric distance of the fault, as all three phase PCDIs vary in almost the same ratio on changing fault
distance. The 3D ratio matrix (R) is hence formed from the 3D PCDI vector of each type
of training fault and of the test data, the elements of which are formed as follows [22]:

$$R = \left[ \frac{\text{PCDI-A}_i}{\text{PCDI-B}_i} \;\; \frac{\text{PCDI-B}_i}{\text{PCDI-C}_i} \;\; \frac{\text{PCDI-C}_i}{\text{PCDI-A}_i} \right]_{12 \times 3} = [\text{Ratio 1}_i \;\; \text{Ratio 2}_i \;\; \text{Ratio 3}_i]_{12 \times 3}$$

where i follows the same indexing pattern. It is to be noted here that for a no-fault condition, the PCDIs of all
the phases are zero and are assigned as the origin. Hence the no-fault condition is identified by comparing
the PCDI directly with a very low constant value, as mentioned later, and the rest are used for the ratio analysis.
[R] is further segmented into training and test matrices, as given by:

$$[\text{Ratio}_{TRAINING}]_{10 \times 3} = [R]_{(i = 2 \text{ to } 11) \times 3} \quad \text{and} \quad [\text{Ratio}_{TEST}]_{1 \times 3} = [R]_{(i = 12) \times 3}$$


The [Ratio TEST] vector will be similar to one of the ten fault prototypes defined by the ten rows
of [Ratio TRAINING]. In order to model this inference mathematically, a 3D ratio error matrix (RE) is formed
using [Ratio TRAINING] and [Ratio TEST], each difference being taken in magnitude:

$$[RE] = \left[ \, |\text{Ratio}_{TRAINING}\,1_i - \text{Ratio}_{TEST}\,1| \;\; |\text{Ratio}_{TRAINING}\,2_i - \text{Ratio}_{TEST}\,2| \;\; |\text{Ratio}_{TRAINING}\,3_i - \text{Ratio}_{TEST}\,3| \, \right]_{10 \times 3}$$

Finally, a column vector of ratio error index (REI) is found by adding the ratio error values of each type,
the elements of which are given as:

$$\text{REI}_i = \text{Ratio Error 1}_i + \text{Ratio Error 2}_i + \text{Ratio Error 3}_i$$
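As a worked check of the ratio analysis, the sketch below recomputes R, RE and REI from the rounded PCDI values listed in Table 1 for the seven non-DL prototypes and the test record. The variable and function names are illustrative assumptions, and small differences from the tabulated values are expected because the tabulated PCDIs are rounded to two decimals.

```python
import numpy as np

labels = ["SLG-A", "SLG-B", "SLG-C", "DLG-AB", "DLG-BC", "DLG-CA", "LLL"]
# Rounded PCDI values (phases A, B, C) copied from Table 1
pcdi_train = np.array([
    [16.10, 3.51, 3.50],
    [3.87, 14.87, 3.87],
    [4.31, 4.31, 14.66],
    [15.27, 16.58, 3.55],
    [2.86, 12.46, 15.14],
    [17.97, 3.18, 13.55],
    [17.00, 14.28, 14.22],
])
pcdi_test = np.array([3.51, 13.28, 3.51])


def ratio_vector(p):
    """Ratio 1 = A/B, Ratio 2 = B/C, Ratio 3 = C/A along the last axis."""
    a, b, c = p[..., 0], p[..., 1], p[..., 2]
    return np.stack([a / b, b / c, c / a], axis=-1)


R_train = ratio_vector(pcdi_train)        # one row per training prototype
R_test = ratio_vector(pcdi_test)          # single test row
RE = np.abs(R_train - R_test)             # ratio error matrix
REI = RE.sum(axis=1)                      # ratio error index per prototype
best = labels[int(np.argmin(REI))]

for name, value in zip(labels, REI):
    print(f"{name:7s} REI = {value:.3f}")
print("closest prototype:", best)
```

The smallest REI belongs to SLG-B, in agreement with the classification reported for this test record.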

Quite understandably, $\text{REI}_i$ is minimum when the test pattern and the corresponding training pattern
match, and this vector is used to classify the fault by identifying the index i with
the minimum REI value. Apart from these, two threshold values ε1 and ε2 are selected based on the test data
set, ε1 being the lower threshold and ε2 the upper one. The no fault condition is
detected by direct comparison of the PCDI summation of the test data with the lower threshold as follows:

$$\text{PCDI}_{TEST,sum} = \text{PCDI-A}_{TEST} + \text{PCDI-B}_{TEST} + \text{PCDI-C}_{TEST}$$

If $\text{PCDI}_{TEST,sum}$ is less than the lower threshold ε1, the condition is identified as no fault, owing to
the absence of any major disturbance in any phase. Conversely, a fault is detected when the
same sum is higher than ε1. DL faults are similarly found by comparing the ratio error index
with the upper threshold ε2, followed by direct analysis of [PCDI] and [R]. The entire analysis is best
understood from the case study discussed in the next section. It is further observed that for DL faults,
the directly unaffected phase remains almost undisturbed, whereas in case of DLG faults, some disturbance
occurs even in the directly unaffected line due to the involvement of ground and the flow of zero sequence current
through the ground and the grounded neutral of the transmission system, thus making the differentiation
between the two types very clear. This inference is also observed from the PCDI matrix discussed in the case
study. Figure 3 elaborates the proposed algorithm in detail.
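The decision flow described above might be sketched as follows. Several choices are assumptions made for illustration and are not specified in the text: the numerical value of the lower threshold (`eps1 = 1e-3`), the rule that a DL fault is declared when even the smallest REI exceeds the upper threshold, and the rule that the specific DL type is read off from the phase with the smallest PCDI. The upper threshold of 100 and the training PCDI values are those reported for Table 1.

```python
import numpy as np

# Rounded training PCDI values (phases A, B, C) of the non-DL prototypes, from Table 1
NON_DL = {
    "SLG-A": (16.10, 3.51, 3.50),
    "SLG-B": (3.87, 14.87, 3.87),
    "SLG-C": (4.31, 4.31, 14.66),
    "DLG-AB": (15.27, 16.58, 3.55),
    "DLG-BC": (2.86, 12.46, 15.14),
    "DLG-CA": (17.97, 3.18, 13.55),
    "LLL": (17.00, 14.28, 14.22),
}
# ASSUMPTION: the DL type is inferred from the index of the undisturbed phase
DL_BY_QUIET_PHASE = {0: "DL-BC", 1: "DL-CA", 2: "DL-AB"}


def ratios(p):
    a, b, c = p
    return np.array([a / b, b / c, c / a])


def classify(pcdi_test, eps1=1e-3, eps2=100.0):
    """Return a fault label for a 3-element test PCDI vector.

    eps1 is a hypothetical value; eps2 = 100 follows the case study.
    """
    p = np.asarray(pcdi_test, dtype=float)
    if p.sum() < eps1:                       # negligible disturbance in all phases
        return "NO-FLT"
    with np.errstate(divide="ignore", invalid="ignore"):
        r_test = ratios(p)
        rei = {
            name: float(np.abs(ratios(np.array(v)) - r_test).sum())
            for name, v in NON_DL.items()
        }
    name, value = min(rei.items(), key=lambda item: item[1])
    if value < eps2:                         # a non-DL prototype matches
        return name
    return DL_BY_QUIET_PHASE[int(np.argmin(p))]


print(classify([3.51, 13.28, 3.51]))
print(classify([13.99, 13.10, 3.6e-15]))
print(classify([0.0, 0.0, 0.0]))
```

Because one PCDI of a DL fault is close to zero, one of its ratios becomes extremely large, so every REI computed against the non-DL prototypes exceeds the upper threshold; this is the property the sketch relies on.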





[Figure 3 flowchart boxes: Training and test data set collected; PCA carried out to find PCA scores; 3D PCA distance matrix [PCDI] formed; PCDI test sum computed; No fault has occurred (Yes branch) / Fault detected; 3D ratio matrix [R] formed; 3D ratio error matrix [RE] formed; Ratio error index [REI] computed; Compare [R] and [RE] to classify; Fault class obtained]

Figure 3. Flowchart of the proposed PCA based fault classifier algorithm 


5. RESULTS AND ANALYSIS 

A sample data set for an arbitrary fault is taken here for the purpose of a case study and
is processed through the proposed PCA algorithm to produce the PCDI matrix shown in the initial columns
of Table 1, which is a combined view of [PCDI], [R], and [RE]. The [PCDI] is further represented
graphically in the form of a three dimensional plot in Figure 4. Close observation of Figure 4 reveals that
the PCDI vector of the unknown type, i.e., legend 9, is closer to the SLG-BG fault, i.e., legend 3, than to
any other type, with minimum Euclidean distance. This is further ascertained by forming [R] as shown in
the middle columns of Table 1. [R] is again represented graphically in Figure 5. Close observation
of [PCDI] and [R] reveals a distinguishing feature for each particular type of fault, i.e., the test fault
PCDI values closely resemble those of SLG-B, and this similarity is further reinforced by the ratio values
of the same, marked in bold letters. The same is observed from Figure 5 as well, where the Euclidean distance
between legends 3 and 9 is much less than that in Figure 4, thus ascertaining the test pattern to
be SLG-B with the MSE criterion. Thus, the formation of R greatly emphasizes the similarity between the test data
and one of the eleven sets of fault prototypes, and this is also tested with varying fault location.
Table 1. PCDI, ratio matrix and ratio error matrix formed from the dataset

| Fault type | PCDI-A | PCDI-B | PCDI-C | Ratio 1 | Ratio 2 | Ratio 3 | Ratio error 1 | Ratio error 2 | Ratio error 3 | Ratio error index (REI) |
|---|---|---|---|---|---|---|---|---|---|---|
| HEALTHY | 0 | 0 | 0 | NaN | NaN | NaN | NA | NA | NA | NA |
| SLG-A | 16.10 | 3.51 | 3.50 | 4.59 | 1.00 | 0.22 | 4.325 | 2.781 | 0.782 | 7.888 |
| SLG-B | 3.87 | 14.87 | 3.87 | 0.26 | 3.84 | 1.00 | 0.004 | 0.058 | 0.001 | 0.063 |
| SLG-C | 4.31 | 4.31 | 14.66 | 1.00 | 0.29 | 3.40 | 0.737 | 3.489 | 2.399 | 6.625 |
| DL-AB | 13.99 | 13.10 | 3.6E-15 | 0.99 | 3.9E+15 | 2.6E-16 | 0.735 | 3.877E+15 | 0.999 | 3.877E+15 |
| DL-BC | 3.6E-15 | 11.44 | 11.43 | 3.1E-16 | 1.00 | 3.2E+15 | 0.265 | 2.782 | 3.165E+15 | 3.165E+15 |
| DL-CA | 13.95 | 2E-15 | 13.95 | 6.9E+15 | 1.5E-16 | 1.00 | 6.839E+15 | 3.783 | 0.001 | 6.84E+15 |
| DLG-AB | 15.27 | 16.58 | 3.55 | 0.92 | 4.67 | 0.23 | 0.657 | 0.882 | 0.767 | 2.306 |
| DLG-BC | 2.86 | 12.46 | 15.14 | 0.23 | 0.82 | 5.29 | 0.035 | 2.960 | 4.288 | 7.283 |
| DLG-CA | 17.97 | 3.18 | 13.55 | 5.66 | 0.23 | 0.75 | 5.089 | 3.548 | 0.202 | 8.839 |
| LLL | 17.00 | 14.28 | 14.22 | 1.19 | 1.00 | 0.84 | 0.926 | 2.778 | 0.163 | 3.867 |
| TEST DATA | 3.51 | 13.28 | 3.51 | 0.26 | 3.78 | 1.00 | NA | NA | NA | NA |



[Figure 4 legend: 1) HEALTHY, 2) SLG-AG, 3) SLG-BG, 4) SLG-CG, 5) DLG-ABG, 6) DLG-BCG, 7) DLG-CAG, 8) 3L-ABC, 9) TEST DATA]

Figure 4. 3D plot of three phase PCDI values for training (ten different types of faults and healthy condition) and test data



[Figure 5 legend: 2) SLG-AG, 3) SLG-BG, 4) SLG-CG, 5) DLG-ABG, 6) DLG-BCG, 7) DLG-CAG, 8) 3L-ABC, 9) TEST DATA]

Figure 5. 3D plot of the three phase Ratio Indices for seven different types of faults (DL fault excluded) and test data


It is further observed that the unaffected phase is least disturbed in case of a DL fault, as
indicated by the corresponding PCDI values, and hence one of the ratio indices is
abruptly high for the DL faults. It is readily observed from Table 1 that, e.g., PCDI-C for a DL-AB fault is
very low since phase C is the unaffected phase here. When it is used to form [R], Ratio 2 becomes
abruptly high, and this is reflected in [RE] as well as in the ratio error index (REI) shown in
the final column of Table 1. The REI is several orders of magnitude larger for DL faults than for all the other
prototypes. This key feature is used effectively to separate the DL faults from the rest, and the upper threshold
value ε2 is set by comparison with all other fault types. For the given set of PCDI, the ratio error
index for faults other than DL faults is well below 100 and that for DL faults is far above it. Hence, ε2 for
this case can safely be set at 100. For the same reasons, DL faults are not included in
the fault signatures of Figure 5. It is further observed that even on varying the geometric fault distance from
the sending end, the PCDIs vary following a particular pattern, as described by the PCDI values of Table 2, where
typical fault data for an SLG-BG fault, for example, are taken at different distances 10 km apart throughout
the entire span of the 150 km long line. More importantly, it is observed that their mutual ratios remain very
similar, even with varying geometric distance (km) over the entire span of the transmission line, as
described by the three ratio index values of the same table. The above fact is also represented in Figure 6,
which is constructed using the three phase [PCDI] and [R] values of Table 2 where, as described earlier, an
SLG-B fault is taken as the example for different fault locations.
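The near-invariance of the ratios can be checked directly from the SLG-BG values listed in Table 2. The sketch below compares the relative spread, (max - min) / min, of each PCDI with that of each ratio over the fourteen fault locations; the spread measure and all names are illustrative choices, not part of the original method.

```python
import numpy as np

# PCDI values for an SLG-BG fault (phases A, B, C) at 10, 20, ..., 140 km, from Table 2
pcdi = np.array([
    [1.4382, 5.0673, 1.4380],
    [2.3356, 8.4395, 2.3345],
    [2.8854, 10.640, 2.8834],
    [3.2509, 12.159, 3.2484],
    [3.5135, 13.282, 3.5110],
    [3.7141, 14.158, 3.7119],
    [3.8766, 14.872, 3.8754],
    [4.0063, 15.442, 4.0055],
    [4.1147, 15.934, 4.1137],
    [4.2109, 16.378, 4.2098],
    [4.2983, 16.783, 4.2976],
    [4.3756, 17.154, 4.3748],
    [4.4500, 17.502, 4.4498],
    [4.5260, 17.832, 4.5269],
])

a, b, c = pcdi[:, 0], pcdi[:, 1], pcdi[:, 2]
ratio = np.column_stack([a / b, b / c, c / a])   # Ratio 1, Ratio 2, Ratio 3


def relative_spread(x):
    """(max - min) / min of every column."""
    return (x.max(axis=0) - x.min(axis=0)) / x.min(axis=0)


pcdi_spread = relative_spread(pcdi)
ratio_spread = relative_spread(ratio)
print("PCDI spread :", np.round(pcdi_spread, 3))
print("ratio spread:", np.round(ratio_spread, 3))
```

Over the line span each PCDI more than triples, whereas each ratio changes by roughly 12% or less, which is the property that lets a single training location serve the classifier.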


Table 2. Ratio matrix formed by the PC distances with variation in geometric fault distances

| Fault location (km) | PCDI-A | PCDI-B | PCDI-C | Ratio 1 | Ratio 2 | Ratio 3 |
|---|---|---|---|---|---|---|
| 10 | 1.4382 | 5.0673 | 1.438 | 0.2838 | 3.5239 | 0.9999 |
| 20 | 2.3356 | 8.4395 | 2.3345 | 0.2767 | 3.6151 | 0.9995 |
| 30 | 2.8854 | 10.64 | 2.8834 | 0.2712 | 3.6902 | 0.9993 |
| 40 | 3.2509 | 12.159 | 3.2484 | 0.2674 | 3.7432 | 0.9992 |
| 50 | 3.5135 | 13.282 | 3.511 | 0.2645 | 3.7828 | 0.9993 |
| 60 | 3.7141 | 14.158 | 3.7119 | 0.2623 | 3.8141 | 0.9994 |
| 70 | 3.8766 | 14.872 | 3.8754 | 0.2607 | 3.8375 | 0.9997 |
| 80 | 4.0063 | 15.442 | 4.0055 | 0.2594 | 3.8552 | 0.9998 |
| 90 | 4.1147 | 15.934 | 4.1137 | 0.2582 | 3.8734 | 0.9998 |
| 100 | 4.2109 | 16.378 | 4.2098 | 0.2571 | 3.8905 | 0.9998 |
| 110 | 4.2983 | 16.783 | 4.2976 | 0.2561 | 3.9051 | 0.9998 |
| 120 | 4.3756 | 17.154 | 4.3748 | 0.2551 | 3.9211 | 0.9998 |
| 130 | 4.45 | 17.502 | 4.4498 | 0.2543 | 3.9333 | 1 |
| 140 | 4.526 | 17.832 | 4.5269 | 0.2538 | 3.9391 | 1.0002 |


[Figure 6 panels: comparison of the variation of PCDI for phases A, B and C and the ratio index for variable geometric distances; plotted series include PCI-A with Ratio 1 and PCI-C with Ratio 3; x-axis: geometric distance (km), 0 to 140]

Figure 6. Variation of three phase PCDI and ratio indices with different geometric fault distances 


It is observed from Figure 6 that the variations of the [R] values are remarkably smaller than those of
the [PCDI] values for different geometric distances over the line span for all three phases. This justifies
the role of [R] and its usefulness in finding the unknown fault pattern, and [R] is therefore taken as
the fundamental governing factor in the determination of the unknown fault pattern. This conclusion is also
supported by Table 3, which shows the results for several unknown fault patterns for faults conducted at
various distances along the 150 km line span; the classifier accuracy is 100% using the proposed PCA-ratio
based classifier algorithm.
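Table 3. Fault classifier results with only one set of training data

| Fault type | PURE | AG | BG | CG | AB | BC | CA | ABG | BCG | CAG | ABC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PURE | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AG | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BG | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CG | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AB | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 |
| BC | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 |
| CA | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 |
| ABG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 |
| BCG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 |
| CAG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 |
| ABC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 |

Overall accuracy: 100%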


6. FAULT DISTANCE ESTIMATION 

The second vital part of the proposed research is the prediction of the fault location.
The proposed fault distance predictor algorithm is designed using multiple linear regression (MLR) analysis.
MLR takes into account the trends and curvatures of more than one data set and effectively computes one
primary direction of variation using the multiple data sets. The proposed work utilizes this important feature
of MLR and uses the three phase features in terms of PCDI to form one key curvature, incorporating
the features of all the PCDIs. For this purpose, six intermediate non-equidistant locations at 10, 20, 50, 90,
130, and 140 km from the sending end of the 150 km long line have been chosen as the six training
points for the proposed fault localizer algorithm. Ten different types of faults have been conducted at these
six training locations and the receiving end current waveforms have been recorded as the training data. Each
record is processed by the proposed fault classifier algorithm discussed in the previous section, and the three
phase PCDIs are found for each of the six training points. This 3D training data set for each fault prototype is
saved as a look up table and is scaled to unity for generalization and uniformity. Hence,
the training data matrix for each fault pattern takes the dimension 6x3; it is called the training distance PCDI
matrix hereafter and is given by $D_i$ as:

$$D_i = [\text{PCDI-A}_{ij} \;\; \text{PCDI-B}_{ij} \;\; \text{PCDI-C}_{ij}]_{6 \times 3}$$

where i = 1 to 10 defines each of the ten training fault prototypes mentioned before and j = 1 to 6 defines the six
training geometric distances at 10, 20, 50, 90, 130, and 140 km respectively. Hence, for the ten types of faults,
there are ten such training distance PCDI matrices, which together form the total training distance PCDI
matrix $D_{TRAINING}$:

$$D_{TRAINING} = [D_1 \; D_2 \; D_3 \; \ldots \; D_{10}]_{6 \times 30}$$

After classification of the fault, the test PCDI vector Q found in the earlier section is saved.
Next, the $D_i$ matrix corresponding to the identified type with index i is taken from $D_{TRAINING}$,
followed by interpolation of the test Q vector from the corresponding $D_i$ using the multiple linear
regression (MLR) method in order to predict the geometric distance of the corresponding fault.


7. CASE STUDY AND ANALYSIS 

A case study is shown here with an SLG-A fault. The variation of the receiving end line currents with
varying geometric fault distance for the SLG-A fault is shown in Figure 7. The same data are processed through
the PCA algorithm to produce [PCDI] and the subsequent calculations. Table 4 lists the absolute PCDI
values and the corresponding scaled values for the SLG-A fault at the six training locations. The D SLG-A matrix is
formed using the PCDI values recorded in columns 2, 3, and 4 of Table 4.

Similarly, the D scaled SLG-A matrix is formed using the values from columns 5, 6, and 7 which, on plotting
against the respective geometric fault locations, reveal a curvilinear nature as shown in Figure 8. It is
observed that each of the fault types shows a difference in curvature for the three individual phases. Hence,
the proposed scheme has been designed with multiple linear regression (MLR) for each prototype
individually, which takes into account all three phase PCDIs to produce a fairly accurate estimate of the
fault location. The mathematical analysis of the MLR scheme adopted here is explained first, followed by its
application in designing the fault location prediction algorithm [23].
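The text does not state how the PCDIs are "scaled to unity". A plain min-max scaling per phase reproduces the scaled phase A and phase B values reported in Table 4 to four decimals, so the sketch below assumes that this is the scaling used; the function name is hypothetical.

```python
import numpy as np

locations_km = np.array([10, 20, 50, 90, 130, 140])
# Absolute PCDI values for the SLG-A fault (phases A, B, C), from Table 4
D_slg_a = np.array([
    [7.4490, 1.6804, 1.6816],
    [11.2046, 2.4365, 2.4365],
    [14.9931, 3.2307, 3.2307],
    [16.7925, 3.7172, 3.7172],
    [18.1095, 4.0781, 4.0781],
    [18.4261, 4.1723, 4.1723],
])


def scale_to_unity(D):
    """Min-max scale every column to the range [0, 1] (assumed scaling)."""
    lo = D.min(axis=0)
    hi = D.max(axis=0)
    return (D - lo) / (hi - lo)


D_scaled = scale_to_unity(D_slg_a)
for km, row in zip(locations_km, D_scaled):
    print(km, np.round(row, 4))
```

With this scaling the first training location maps to 0 and the last to 1 in every phase, matching the first and last rows of the scaled columns.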
Figure 7. Receiving end line current vs. sampled time plot for different geometric fault locations for SLG-A fault


Table 4. PCDI for A phase to ground fault at six different locations

| Fault location (km) | PCDI Phase A | PCDI Phase B | PCDI Phase C | PCDI (scaled) Phase A | PCDI (scaled) Phase B | PCDI (scaled) Phase C |
|---|---|---|---|---|---|---|
| 10 | 7.449 | 1.6804 | 1.6816 | 0 | 0 | 0 |
| 20 | 11.2046 | 2.4365 | 2.4365 | 0.3421 | 0.3034 | 0.3039 |
| 50 | 14.9931 | 3.2307 | 3.2307 | 0.6873 | 0.6221 | 0.6223 |
| 90 | 16.7925 | 3.7172 | 3.7172 | 0.8512 | 0.8174 | 0.8172 |
| 130 | 18.1095 | 4.0781 | 4.0781 | 0.9712 | 0.9622 | 0.9622 |
| 140 | 18.4261 | 4.1723 | 4.1723 | 1 | 1 | 1 |
Figure 8. Geometric fault distance vs. PCDI (scaled) plot for three phase receiving end line currents for SLG-A fault at six training locations


8. APPLICATION OF MULTIPLE LINEAR REGRESSION (MLR) 

Principal component analysis (PCA), as explained so far, is itself an important and effective tool for
reducing a large number of multivariate data to a few primary directions of major variation. The three
phases of PCDI differ in curvature, as observed from Figure 10. This holds for
all ten fault patterns. The three phase PCDI for each pattern is processed by
the proposed MLR based scheme to achieve a single computed direction of variation, taking into account
the three curvatures from the three phases, which is finally taken as the training data for the proposed fault
distance predictor algorithm. Regression analysis is an important statistical tool to determine the relationship,
called the regression function, between a dependent variable 'y' and one or several independent
variables 'x_i'. The regression function also involves a set of unknown parameters 'b_i', called the regression
coefficients. A simple linear regression model is described as:

$$y = b_0 + b_1 x_1 \qquad (1)$$

Linear regression models with multiple independent variables are referred to as multiple linear models;
such a model is given as:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots + b_n x_n \qquad (2)$$

where n is the total number of independent variables.

The proposed algorithm uses scaled PCDI for the three phases as the input variable Xj. Figure 10 
reveals that the three phase scaled PCDI shows a curvilinear nature, rather than a straight line trend. Hence, 
to take into account this curvature of PCDI, the proposed algorithm is extended to multiple orders of these 
primary inputs xi depending on the Minimum Square Error (MSE) criteria. It is also to be mentioned 
here that the no of independent variables have clearly been taken depending upon the MSE criteria, and more 
so, the number of such variables vary from one fault type to another and also the order and type 
and interdependence, if any, among the variables. Thus, the proposed scheme is constructed as: 

The primary input variable is defined as, 

Xi=D sca led i=[PCDI-A SC alediPCDI-B s calediPCDI-Cscaledi] 6 x 3 (3) 

and the elements are ordered as, 

$X_1 = [x_{1,11} \; x_{1,12} \; x_{1,13}; \;\; x_{1,21} \; x_{1,22} \; x_{1,23}; \;\; \dots; \;\; x_{1,61} \; x_{1,62} \; x_{1,63}]_{6 \times 3}$ (3a) 

where $D_{\mathrm{scaled},i}$ is the training matrix of a particular type of fault, containing the three-phase PCDI 
corresponding to six different training locations and hence taken as the primary input variable, and $i$ is 
the index of the fault class identified by the ratio analysis. Further, MLR has been adopted here with multiple 
inputs formed from several orders of the primary input $X_1$. Hence, the complete regression equation for one 
training pattern takes the form: 

$Y = X_1 B_1 + X_2 B_2 + \dots + X_k B_k$ (4) 

where $Y$ is the output vector; in the proposed case, $Y$ is formed by the six geometric training fault 
locations, defined as: 

$Y = [y_1 \; y_2 \; y_3 \; y_4 \; y_5 \; y_6]^T_{6 \times 1}$ (5) 

where $y_1, y_2, \dots, y_6$ are the training fault locations, taken as 10, 20, 50, 90, 130, and 140 km respectively, which 
are fixed for the proposed scheme, and $X_k$ is the $k$-th order polynomial expression of the primary input $X_1$, i.e. 

$X_k = [X_1]^k_{6 \times 3}$ (6) 

The idea is to train the proposed MLR-based fault localization algorithm with the best-fit 
arrangement, taking all three phases together, although the maximum variation occurs in 
the directly affected line. The maximum order of $X_1$, i.e., the index $k$, has been limited to 12, i.e., twice the number 
of training locations, only to reduce computational complexity, and the intermediate orders, i.e., the values 
of the index $k$, are set according to the MSE criterion. Thus, the complete input matrix is described by, 

$[X] = [X_1 \; X_2 \; \dots \; X_k]_{6 \times 3k}$ (6a) 
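
The construction of the input matrix in Eqs. (6) and (6a) can be sketched as follows. This is only an illustrative reading: it assumes that the $k$-th order term is the element-wise $k$-th power of the primary input, and the function name `build_input_matrix`, the list of selected orders and the PCDI values are hypothetical placeholders.

```python
def build_input_matrix(x1, orders):
    """Stack element-wise powers of the 6 x 3 primary input side by side.

    With len(orders) selected orders the result has 3 * len(orders) columns,
    laid out as [phase A, B, C at order 1 | phase A, B, C at order 2 | ...].
    """
    rows = []
    for row in x1:
        expanded = []
        for order in orders:
            expanded.extend(value ** order for value in row)
        rows.append(expanded)
    return rows


# Placeholder scaled PCDI: six training locations (rows), phases A, B, C (columns).
x1 = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 3.5],
    [2.0, 3.0, 4.0],
    [2.5, 3.5, 4.5],
    [3.0, 4.0, 5.0],
    [3.5, 4.5, 5.5],
]
X = build_input_matrix(x1, orders=[1, 2])
print(len(X), len(X[0]))  # 6 rows, 6 columns
```

Keeping the orders in an explicit list mirrors the idea that only some intermediate orders are retained for a given fault type.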

and the coefficient matrix $B_i$ for each variable $X_i$, obtained from the regression analysis as a $3 \times 1$ vector, is described by, 

$B_i = [b_{i1} \; b_{i2} \; b_{i3}]^T$ (7) 

and the complete coefficient matrix for each training pattern is defined by $B$ as: 

$B = [B_1; \; B_2; \; \dots; \; B_k]_{3k \times 1} = [b_{11} \; b_{12} \; b_{13} \; b_{21} \; b_{22} \; b_{23} \; \dots \; b_{k1} \; b_{k2} \; b_{k3}]^T_{3k \times 1}$ (8) 

which is a $3k \times 1$ vector. In general, the coefficients of $B$ are described as: 

$B = [b_1 \; b_2 \; b_3 \; \dots \; b_{3k}]^T_{3k \times 1}$ (9) 

The maximum number of input variables and the particular orders differ among the ten training 
patterns and have been chosen according to the MSE criterion, producing a different coefficient matrix for each 
training pattern. In short, the non-linear nature of the PCDI is accounted for through the MLR analysis. 
Equations (4) to (9) describe the MLR analysis for each fault pattern; the complete equation, 
obtained from (4), can be written in matrix form as follows: 

$Y = X B$ (10) 

In order to estimate the regression coefficients, a least-squares approach has been adopted; thus, the 
algorithm minimizes the sum of squared errors, described as: 

$e = \sum_i (y_i - b_1 x_{i1} - b_2 x_{i2} - b_3 x_{i3} - \dots - b_{3k} x_{i,3k})^2$ (11) 

which is evaluated over all training values and is minimized by setting 

$[B] = ([X]^T [X])^{-1} ([X]^T [Y])$ (12) 

where $([X]^T [X])$ and $([X]^T [X])^{-1}$ are $3k \times 3k$ symmetric matrices and $([X]^T [Y])$ is a $3k$-dimensional 
vector. Hence the fitted values are, 

$[\hat{Y}] = [X][B]$ (13) 

and the residuals are given by, 

$[R] = [Y] - [\hat{Y}]$ (14) 
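
The least-squares steps of Eqs. (12) to (14) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the input feature values are hypothetical, only one feature with first and second order columns is used so that $[X]^T[X]$ is invertible, and the helper names (`transpose`, `matmul`, `solve`, `least_squares`) are illustrative. Only the six training locations are taken from the text. Note that with six training rows, $[X]^T[X]$ can be inverted only if the input matrix has at most six linearly independent columns.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve a * z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular for this choice of columns")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(X, y):
    """Return coefficients B, fitted values and residuals for Y = X B."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)                                     # [X]^T [X]
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]  # [X]^T [Y]
    B = solve(XtX, Xty)                                     # Eq. (12)
    fitted = [sum(xi * bi for xi, bi in zip(row, B)) for row in X]  # Eq. (13)
    residuals = [yi - fi for yi, fi in zip(y, fitted)]      # Eq. (14)
    return B, fitted, residuals


# Hypothetical single-feature input, expanded to first and second order columns.
x = [0.12, 0.21, 0.48, 0.93, 1.28, 1.41]
X = [[v, v * v] for v in x]
y = [10.0, 20.0, 50.0, 90.0, 130.0, 140.0]  # training locations from the text
B, fitted, residuals = least_squares(X, y)
mse = sum(r * r for r in residuals) / len(residuals)
print(B, mse)
```

The linear system is solved directly rather than by forming the inverse explicitly, which gives the same coefficients as Eq. (12) when the inverse exists.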


These residuals have been minimized following the MSE criterion, and the corresponding orders 
of polynomials for each type of training set have been obtained and stored in a look-up table. Thus, each 
training pattern has a different B vector, differing both in magnitude and in dimensionality. 
In order to test any unknown fault current, the proposed fault classifier algorithm based on the ratio analysis 
is applied first to identify the exact type of fault, followed by fault distance prediction using MLR as 
described. The three-phase PC indices corresponding to the experimental waveform are analyzed by 
the same location prediction algorithm, using the regression coefficient matrix (B) corresponding to 
the predicted fault type, as determined by the classifier, and the predicted location is derived. 
The proposed algorithm is described in Figure 9 in the form of a flowchart. 
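
The test-time procedure described above (classify first, then apply the stored coefficients of that fault type) can be sketched as follows. Everything in this sketch is a hypothetical placeholder: the look-up table contents, the selected orders, the coefficient values and the function names are invented for illustration, and the fault type is passed in directly instead of coming from the ratio-based classifier.

```python
# Hypothetical look-up table: fault type -> selected orders and coefficient vector B.
# The numbers are placeholders, not coefficients reported in the paper.
LOOKUP = {
    "SLG-A": {"orders": [1, 2], "B": [40.0, 10.0, 5.0, 2.0, 1.0, 0.5]},
    "DL-AB": {"orders": [1], "B": [30.0, 30.0, 5.0]},
}


def feature_row(pcdi, orders):
    """Expand one three-phase scaled PCDI sample into a 1 x 3k row of powers."""
    return [value ** order for order in orders for value in pcdi]


def predict_location(fault_type, pcdi):
    """Compute y = x B using the coefficients stored for the given fault type."""
    entry = LOOKUP[fault_type]
    row = feature_row(pcdi, entry["orders"])
    if len(row) != len(entry["B"]):
        raise ValueError("coefficient vector does not match the selected orders")
    return sum(x * b for x, b in zip(row, entry["B"]))


# In the full scheme the fault type would be the output of the classifier.
location_km = predict_location("DL-AB", [1.0, 0.5, 0.2])
print(location_km)
```

Storing the orders next to the coefficients reflects the point that the B vector differs in dimensionality from one fault type to another.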



Figure 9. Flowchart illustrating fault location predictor algorithm 

9. RESULTS OF FAULT LOCATION PREDICTOR 

Table 5 summarizes the results of the proposed fault location predictor algorithm for ten different 
types of faults occurring at different locations. The proposed scheme produces an average 
deviation of 0.0871 km for the 150 km long transmission line, which is well within a satisfactory margin. 


Table 5. Results showing the fault distance predictor algorithm performance with varying fault location 

Type of fault      Average deviation with all distances (km)      Average deviation (km) 
SLG-A              0.1409 
SLG-B              0.1045 
SLG-C              0.0113 
DL-AB              0.1301 
DL-BC              0.0391 
DL-CA              0.1786 
DLG-AB             0.0287 
DLG-BC             0.0447 
DLG-CA             0.12 
LLL                0.0736 
All fault types                                                   0.0871 
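
As a quick check of the arithmetic, the overall figure can be reproduced from the per-fault deviations in Table 5. The short sketch below uses only the tabulated values and the 150 km line length. The variable names are illustrative, and the percentage of line length is a derived quantity that the paper does not state.

```python
# Per-fault average deviations from Table 5, in km.
deviations_km = {
    "SLG-A": 0.1409, "SLG-B": 0.1045, "SLG-C": 0.0113,
    "DL-AB": 0.1301, "DL-BC": 0.0391, "DL-CA": 0.1786,
    "DLG-AB": 0.0287, "DLG-BC": 0.0447, "DLG-CA": 0.12,
    "LLL": 0.0736,
}
LINE_LENGTH_KM = 150.0

overall_km = sum(deviations_km.values()) / len(deviations_km)
percent_of_line = 100.0 * overall_km / LINE_LENGTH_KM
print(overall_km, overall_km * 1000.0, percent_of_line)
```

The mean of the ten values is 0.08715 km, i.e. about 87 m, which corresponds to roughly 0.06% of the line length.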



10. CONCLUSION 

A simple and effective power system protection scheme for fault classification and fault distance prediction 
on long transmission lines has been proposed here for a single-end-fed 400 kV, 50 Hz, 150 km long radial 
transmission line. Principal component analysis and multiple linear regression analysis have been adopted 
to realize, design and implement the proposed protection scheme. Quarter-cycle pre-fault and half-cycle 
post-fault receiving-end three-phase fault current waveforms are fed as the only input to the algorithm. 
The PCA scores computed from the input data are used to construct principal component 
distance indices (PCDI), which are used to develop a ratio-based algorithm to identify and classify faults. 
Results show that the classifier achieves 100% accuracy using only one set of training data taken almost at 
the midpoint of the line. 

Thus the small amount of training data is one of the key features of the proposed fault classifier. The scheme 
uses PCA-based analysis only, instead of ANN or wavelet transform based approaches. ANN requires large 
training data, and hence the training time is also very high. Wavelet analysis, on the other hand, is 
computationally burdensome. Most of the other methods also involve more complex analysis, which 
requires a longer computation time. Simplicity compared to some other existing methods 
and a lower computation time are other key features of the scheme. The proposed protection scheme is further 
extended to develop a fault localizer algorithm. The average deviation of the predicted fault location is only about 
87.1 m. Hence, the proposed algorithm has high accuracy in determining power system fault locations as 
well. Accurate fault localization helps personnel to identify the fault point quickly and saves valuable time 
and effort in restoring normal operation at the earliest. Thus the proposed protection scheme has all 
the qualities needed for the development of a reliable transient-based power system protection unit. 